85 research outputs found

    A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication

    The strength with which a statement is made can have a significant impact on the audience. For example, international relations can be strained by how the media in one country describes an event in another; and papers can be rejected because they overstate or understate their findings. It is thus important to understand the effects of statement strength. A first step is to be able to distinguish between strong and weak statements. However, even this problem is understudied, partly due to a lack of data. Since strength is inherently relative, revisions of texts that make claims are a natural source of data on strength differences. In this paper, we introduce a corpus of sentence-level revisions from academic writing. We also describe insights gained from our annotation efforts for this task. Comment: 6 pages, to appear in Proceedings of ACL 2014 (short paper)

    Tracing Community Genealogy: How New Communities Emerge from the Old

    The process by which new communities emerge is a central research issue in the social sciences. While a growing body of research analyzes the formation of a single community by examining social networks between individuals, we introduce a novel community-centered perspective. We highlight the fact that the context in which a new community emerges contains numerous existing communities. We reveal the emerging process of communities by tracing their early members' previous community memberships. Our testbed is Reddit, a website that consists of tens of thousands of user-created communities. We analyze a dataset that spans over a decade and includes the posting history of users on Reddit from its inception to April 2017. We first propose a computational framework for building genealogy graphs between communities. We present the first large-scale characterization of such genealogy graphs. Surprisingly, basic graph properties, such as the number of parents and max parent weight, converge quickly despite the fact that the number of communities increases rapidly over time. Furthermore, we investigate the connection between a community's origin and its future growth. Our results show that strong parent connections are associated with future community growth, confirming the importance of existing community structures in which a new community emerges. Finally, we turn to the individual level and examine the characteristics of early members. We find that a diverse portfolio across existing communities is the most important predictor for becoming an early member in a new community. Comment: 10 pages, 7 figures, to appear in Proceedings of ICWSM 2018, data and more at https://chenhaot.com/papers/community-genealogy.htm

    Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts

    Understanding how ideas relate to each other is a fundamental question in many domains, ranging from intellectual history to public communication. Because ideas are naturally embedded in texts, we propose the first framework to systematically characterize the relations between ideas based on their occurrence in a corpus of documents, independent of how these ideas are represented. Combining two statistics (cooccurrence within documents and prevalence correlation over time), our approach reveals a number of different ways in which ideas can cooperate and compete. For instance, two ideas can closely track each other's prevalence over time, and yet rarely cooccur, almost like a "cold war" scenario. We observe that pairwise cooccurrence and prevalence correlation exhibit different distributions. We further demonstrate that our approach is able to uncover intriguing relations between ideas through in-depth case studies on news articles and research papers. Comment: 11 pages, 9 figures, to appear in Proceedings of ACL 2017, code and data available at https://chenhaot.com/pages/idea-relations.html (fixed a typo)

    Urban Dreams of Migrants: A Case Study of Migrant Integration in Shanghai

    Unprecedented human mobility has driven rapid urbanization around the world. In China, the fraction of the population dwelling in cities increased from 17.9% to 52.6% between 1978 and 2012. Such large-scale migration poses challenges for policymakers and important questions for researchers. To investigate the process of migrant integration, we employ a one-month complete dataset of telecommunication metadata in Shanghai with 54 million users and 698 million call logs. We find systematic differences between locals and migrants in their mobile communication networks and geographical locations. For instance, migrants have more diverse contacts and move around the city with a larger radius than locals after they settle down. By distinguishing new migrants (who recently moved to Shanghai) from settled migrants (who have been in Shanghai for a while), we demonstrate the integration process of new migrants in their first three weeks. Moreover, we formulate classification problems to predict whether a person is a migrant. Our classifier is able to achieve an F1-score of 0.82 when distinguishing settled migrants from locals, but it remains challenging to identify new migrants because of class imbalance. This classification setup holds promise for identifying new migrants who will successfully integrate with locals (new migrants that are misclassified as locals). Comment: A modified version. The paper was accepted by AAAI 201